801 research outputs found

Large Scale Genomic Sequence SVM Classifiers

Author: Rätsch G.
Schölkopf B.
Sonnenburg S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/08/2005

In genomic sequence analysis tasks such as splice site recognition or promoter identification, large numbers of training sequences are available, and indeed needed, to achieve sufficiently high classification performance. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree (WD) kernel. In particular, we suggest several extensions using suffix trees and modifications of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the Spectrum kernel and the WD kernel, large-scale SVM training can be accelerated by factors of 20 and 4, respectively, while using much less memory (e.g. no kernel caching). The evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of support vectors). Our method allows us to train on sets as large as one million sequences.
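
For readers unfamiliar with the two kernels named in this abstract, the following is a minimal Python sketch of their textbook definitions. It is a naive implementation for illustration only, not the suffix-tree accelerated method the abstract describes. The function names, the default values of k and d, and the weighting beta_k = 2(d - k + 1) / (d(d + 1)) for the WD kernel are assumptions taken from the common formulation in the literature, not from this listing.

```python
from collections import Counter


def spectrum_kernel(x, y, k=3):
    """Spectrum kernel: inner product of the k-mer count vectors of x and y."""
    counts_x = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    counts_y = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    # Only k-mers that occur in both strings contribute to the sum.
    return sum(c * counts_y[kmer] for kmer, c in counts_x.items())


def weighted_degree_kernel(x, y, d=3):
    """Weighted Degree kernel: position-wise k-mer matches for k = 1..d.

    Assumes equal-length sequences and the commonly used weights
    beta_k = 2 * (d - k + 1) / (d * (d + 1)).
    """
    if len(x) != len(y):
        raise ValueError("WD kernel requires sequences of equal length")
    length = len(x)
    total = 0.0
    for k in range(1, d + 1):
        beta = 2.0 * (d - k + 1) / (d * (d + 1))
        # Count positions where the k-mers starting at i are identical.
        matches = sum(x[i:i + k] == y[i:i + k] for i in range(length - k + 1))
        total += beta * matches
    return total
```

The Spectrum kernel ignores where a k-mer occurs, whereas the WD kernel only counts matches at the same position, which is why the latter needs aligned sequences of equal length.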

MPG.PuRe

The Feature Importance Ranking Measure

Author: A. Graf
B. Schölkopf
B. Üstün
C. Strobl
G. Rätsch
G.R.G. Lanckriet
J. Friedman
J. Schäfer
K. Bennett
M. Laan van der
R. Tibshirani
S. Sonnenburg
Publication date: 01/01/2009

The most accurate predictions are typically obtained by learning machines with complex feature spaces (e.g. as induced by kernels). Unfortunately, such decision rules are hardly accessible to humans and cannot easily be used to gain insights about the application domain. Therefore, one often resorts to linear models in combination with variable selection, thereby sacrificing some predictive power for presumptive interpretability. Here, we introduce the Feature Importance Ranking Measure (FIRM), which, by retrospective analysis of arbitrary learning machines, makes it possible to achieve both excellent predictive performance and superior interpretation. In contrast to standard raw feature weighting, FIRM takes the underlying correlation structure of the features into account. Thereby, it is able to discover the most relevant features, even if their appearance in the training data is entirely prevented by noise. The desirable properties of FIRM are investigated analytically and illustrated in simulations. Comment: 15 pages, 3 figures. To appear in the Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 200

arXiv.org e-Print Archive

A Unifying View of Multiple Kernel Learning

Author: A. Rakotomamonjy
B. Schölkopf
C. Zhu
F.R. Bach
G.R.G. Lanckriet
H. Zou
K.-R. Müller
M. Kloft
P.L. Bartlett
R.M. Rifkin
R.T. Rockafellar
S. Sonnenburg
V.N. Vapnik
Publication date: 01/01/2010

Recent research on multiple kernel learning has led to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper we present a unifying general optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically, using a Rademacher complexity bound on the generalization error, and empirically in a set of experiments.
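
As a rough illustration of what "combining kernels" means here, the sketch below forms a non-negative weighted sum of precomputed Gram matrices, which is again a valid kernel matrix. It is a hypothetical helper, not the optimization criterion of the paper: in multiple kernel learning the weights are learned jointly with the classifier, whereas in this sketch they are simply passed in by hand. The function name `combine_kernels` and the list-of-lists matrix representation are assumptions made for this example.

```python
def combine_kernels(grams, weights):
    """Return the weighted sum of Gram matrices: K = sum_m weights[m] * grams[m].

    grams   -- list of square matrices (lists of lists) of identical size
    weights -- list of non-negative floats, one per Gram matrix
    (The weights are fixed inputs here; MKL would learn them.)
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep K a valid kernel")
    n = len(grams[0])
    combined = [[0.0] * n for _ in range(n)]
    for gram, w in zip(grams, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * gram[i][j]
    return combined
```

For example, combining an identity Gram matrix and an all-ones Gram matrix with equal weights of 0.5 yields a matrix with ones on the diagonal and 0.5 elsewhere.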

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

mGene.web: a web service for accurate computational gene finding

Author: A. Zien
Bernal
Besemer
Brent
C. S. Ong
G. Ratsch
G. Schweikert
G. Zeller
J. Behr
S. Sonnenburg
Salamov
Publication venue: Oxford University Press
Publication date: 01/01/2009

We describe mGene.web, a web service for the genome-wide prediction of protein-coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures, including untranslated regions, in an increasing number of organisms. With mGene.web, users can additionally train the system with their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques, and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web; it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).

Edinburgh Research Explorer

MPG.PuRe

Evaluation of antigens for the serodiagnosis of kala-azar and oriental sores by means of the indirect immunofluorescence antibody test (IFAT)

Author: A. Zuckermann
E. C. Hedge
E. Mannweiler
F. Falkner v. Sonnenburg
G. Piekarski
G. Weiland
Gh. H. Endrissian
H. E. Krampitz
J. J. Shaw
J. Ranque
L. Prüfer
M. Lopez-Brea
N. Beforouz
R. S. Bray
T. I. Aljeboori
Th. Löscher
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1981

Antigens and corresponding sera were collected from travellers with leishmaniasis returning to Germany from different endemic areas of the Old World. The antigenicity of these Leishmania strains, which were maintained in Syrian hamsters, was compared by the indirect immunofluorescence antibody test (IFAT). Antigenicity was demonstrated by antibody titres in 18 sera from 11 patients. The amastigote stages of nine strains of Leishmania donovani and four strains of Leishmania tropica were compared with each other and with the culture forms of insect flagellates (Strigomonas oncopelti and Leptomonas ctenocephali). Eighteen sera from 11 patients were available for antibody determination with these antigens. The maximal antibody titres in a single serum varied considerably depending on which antigen was used for the test. High antibody levels could only be obtained when Leishmania donovani was employed as the antigen, but considerable differences also occurred between the different strains of this species. The other antigens were weaker. No differences in antigenicity between amastigotes and promastigotes of the same strain were observed. It is important to select suitable antigens. Low titres may be of doubtful specificity and are a poor baseline for the fall in titre which is an essential index of effective treatment.

We collected parasites and sera from travellers who returned to Germany with leishmaniasis from various endemic areas of the Old World. The antigenic activities of the isolated Leishmania strains, which were maintained continuously in golden hamsters, were compared in the indirect immunofluorescence antibody test (IFAT). Antigenicity was demonstrated by antibody titres in 18 serum samples from 11 patients. Nine strains of the Leishmania donovani complex and four Leishmania tropica isolates were compared with one another in their amastigote stage. In addition, two insect flagellates were included as culture forms: Strigomonas oncopelti and Leptomonas ctenocephali. Eighteen serum samples from 11 patients were available for antibody determination with these antigens. The maximum titres in one and the same serum sample varied, in some cases considerably, depending on which antigen was used for the test. High antibody titres could only be obtained when Leishmania donovani was used as the antigen, but considerable differences in antigenic activity were also found between the individual strains of this Leishmania species. Antigens of other kinds proved to be of little effect. No differences in antigenic activity could be detected between the amastigote and promastigote developmental forms of a Leishmania donovani strain. For the detection of the highest possible antibody titres in the IFAT, the selection of suitable antigens is of decisive importance. Low titres make it harder to judge them as specific and are a poor starting point for observing the obligatory fall in titre after successful therapy.

Crossref

Open Access LMU

Probabilistic Clustering of Time-Evolving Distance Data

Author: AK Jain
AY Ng
C Leslie
CP Robert
D Blei
DD Lee
DM Blei
Gunnar Rätsch
H Saigo
J Pitman
Julia E. Vogt
M Bilodeau
Marius Kloft
MB Eisen
MS Srivastava
P McCullagh
RM Neal
S Sonnenburg
Sandhya Prabhakaran
SN MacEachern
Stefan Stark
Sudhir S. Raman
SVN Vishwanathan
TS Ferguson
TW Anderson
Volker Roth
WJ Ewens
Publication date: 01/01/2015

We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method uses the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and the identities of the objects do not need to be known. Further, the model does not require the number of clusters to be specified in advance; it is instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data, showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time.
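
To make concrete how a Dirichlet process prior can leave the number of clusters open, here is a small Python sketch of the Chinese restaurant process, the standard sequential view of the partitions such a prior induces. It only illustrates the prior and is not the time-evolving distance-based model of the paper. The function name `sample_crp` and the parameters `alpha` and `seed` are assumptions made for this example.

```python
import random


def sample_crp(n, alpha=1.0, seed=None):
    """Draw cluster labels for n objects from a Chinese restaurant process.

    Object i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha), so the number of clusters is not fixed in advance.
    """
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of objects currently in cluster k
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments
```

Larger values of `alpha` tend to produce more clusters, while the "rich get richer" rule makes large clusters more likely to attract new objects.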

arXiv.org e-Print Archive

Crossref

edoc

Exploiting physico-chemical properties in string kernels

Author: B Peters
B Shen
C Leslie
Christian Widmer
CS Ong
CW Tung
G Rätsch
G Schweikert
Gunnar Rätsch
H Rangwala
H Saigo
J Weston
L Jacob
M Röttig
M Venkatarajan
N Pfeifer
Nora C Toussaint
Oliver Kohlbacher
R Kuang
RM Clark
S Henikoff
S Kawashima
S Sonnenburg
SJ Schultheiss
V Roth
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: String kernels are commonly used for the classification of biological sequences, both nucleotide and amino acid sequences. Although string kernels are already very powerful, they have a major shortcoming when it comes to amino acids: they ignore an important piece of information when comparing amino acids, namely physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data is less abundant. So far there have been only very few approaches that aim at combining these two ideas. Results: We propose new string kernels that combine the benefits of physico-chemical descriptors for amino acids with those of string kernels. The benefits of the proposed kernels are assessed on two problems: MHC-peptide binding classification using position-specific kernels and protein classification based on the substring spectrum of the sequences. Our experiments demonstrate that incorporating amino acid properties in string kernels yields improved performance compared to standard string kernels and to previously proposed non-substring kernels. Conclusions: In summary, the proposed modifications, in particular the combination with the RBF substring kernel, consistently yield improvements without affecting the computational complexity. The proposed kernels therefore appear to be the kernels of choice for any protein sequence-based inference. Availability: Data sets, code and additional information are available from http://www.fml.tuebingen.mpg.de/raetsch/suppl/aask. Implementations of the developed kernels are available as part of the Shogun toolbox.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Support Vector Machines and Kernels for Computational Biology

ISSN: 1553-734X; ISSN: 1553-735

Repository for Publications and Research Data

Crossref

Fraunhofer-ePrints

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Efficient Training of Graph-Regularized Multitask SVMs

Author: A. Torralba
C. Cortes
D. Bertsekas
K.R. Müller
M. Kloft
R. Fan
R.M. Rifkin
S. Sonnenburg
T. Evgeniou
T. Joachims
T.W.T.C.C. Consortium
W. Samek
Y. Xue
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

We present an optimization framework for graph-regularized multi-task SVMs based on the primal formulation of the problem. Previous approaches employ a so-called multi-task kernel (MTK) and thus are inapplicable when the number of training examples n is large (they are typically limited to n < 20,000, even for just a few tasks). In this paper, we present a primal optimization criterion, allowing for general loss functions, and derive its dual representation. Building on the work of Hsieh et al. [1,2], we derive an algorithm for optimizing the large-margin objective and prove its convergence. Our computational experiments show a speedup of up to three orders of magnitude over LibSVM and SVMLight for several standard benchmarks as well as challenging data sets from the application domain of computational biology. Combining our optimization methodology with the COFFIN large-scale learning framework [3], we are able to train a multi-task SVM using over 1,000,000 training points stemming from 4 different tasks. An efficient C++ implementation of our algorithm is being made publicly available as part of the SHOGUN machine learning toolbox [4].

Crossref

MPG.PuRe